Abstract. We consider the k-core decomposition of network models and Internet
graphs at the autonomous system (AS) level. The k-core analysis makes it
possible to characterize networks beyond the degree distribution and to uncover
structural properties and hierarchies due to the specific architecture of the
system. We compare the k-core structure obtained for AS graphs with those of
several network models and discuss the differences and similarities with the
real Internet architecture. The presence of biases and the incompleteness of the
real maps are discussed, and their effect on the k-core analysis is assessed
with numerical experiments simulating biased exploration on a wide range of
network models. We find that the k-core analysis provides an interesting
characterization of the fluctuations and incompleteness of maps, as well as
information that helps to discriminate the original underlying structure.

1. Introduction. In recent times, mapping projects of the World Wide Web (WWW)
and the physical Internet have offered the first chance to study the topology
and traffic of large-scale networks. The study of large-scale networks, however,
confronts us with an array of new challenges. The definitions of centrality,
hierarchies and structural organizations are in particular hindered by the large
size of the systems and the complex interplay of engineering, traffic,
geographical, and economic attributes characterizing their construction.

In this paper we propose the k-core decomposition as a graph analysis tool able
to highlight interesting structural properties that are not captured by the
degree distribution or other simple topological measures. The k-core
decomposition [43, 8, 5] consists in identifying particular subsets of the
network, called k-cores, each one obtained by a recursive pruning strategy. The
k-core decomposition therefore provides a probe to study the hierarchical
properties of large-scale networks, focusing on the network's regions of
increasing centrality and connectedness. More central cores are indeed more
strongly connected, with a larger number of possible distinct paths between
vertices: this yields not only more robust routing properties but also a better
opportunity to find a path with a specific Quality of Service (QoS).

Here we study a set of basic network models and the AS-level Internet maps
obtained in two large-scale measurement projects using very different
techniques. We first characterize the k-core structure of real Internet maps and
compare it with the structure obtained in the various models. We find that the
k-core structure is extremely different in light-tailed and heavy-tailed
networks and is able to clearly discriminate among various models presented in
the literature. In this perspective the k-core analysis represents a useful tool
in the model validation process. Moreover, any result concerning Internet maps
has to consider their incompleteness and the presence of measurement biases. For
this reason we also present a study of the stability of the k-core analysis in
the presence of biases and incomplete sampling in all the network models
considered. Our findings indicate that the fingerprints of the k-core
decomposition allow the discrimination between heterogeneous and homogeneous
topologies even after an incomplete sampling: this shows that the signatures
observed in the AS Internet maps are qualitatively reliable, even if some biases
are unavoidable at a detailed quantitative level.



2. Related work. In recent years, a wealth of studies have focused on the
large-scale structure and heterogeneities of networked structures of practical
interest in social science, critical infrastructures and epidemiology
[1, 17, 40]. The Internet has been readily considered as a prototypical example
of a complex network by the scientific community and, starting with the seminal
paper by Faloutsos, Faloutsos and Faloutsos [21], an impressive number of papers
have dealt with the characterization
of its large scale properties and hierarchies PH SH 001 031 [3T1 1^ . While the initial 
interest has been focused on the general principles leading to the basic
organization features of complex networks, the research activity is now diving
into system-specific features that distinguish and highlight the various forces
and/or engineering at work in each class of networks. This is a particularly
pressing need in the Internet, where even at the Autonomous System (AS) level
the large-scale self-organization principles are working along with economic and
technical constraints, optimization principles and so on [29, 19]. In addition,
actual Internet maps are not free from errors and measurement biases. For this
reason, recent works have been devoted to a better understanding of the possible
sources of errors and biases presented by the
experimental data [27lllTJ[Tni[ini[2Sl[I31ll7]- Since Internet maps are typically based 
on a sampling of routes between sources and destinations (obtained by tools such
as traceroute), these studies have dealt with simplified models of
traceroute-like sampling, applied to graphs with various topological properties.
They have shown that, except in some peculiar cases [13], the sampling process
makes it possible to distinguish qualitatively between networks with strongly
different properties (homogeneous vs. heterogeneous), while a quantitative and
detailed view of the network may suffer from important biases.

Here, we consider the use of the k-core decomposition as a probe for the
structure of Internet maps. The k-core decomposition has mostly been used in
biologically related contexts, where it was applied to the analysis of protein
interaction networks or to the prediction of protein functions [3, 50]. An
interesting application in the area of networking has been provided by
Gkantsidis et al. [24] and Gaertler et al. [23], where the k-core decomposition
is used for filtering out peripheral Autonomous Systems (ASes) in the case of
Internet maps. The k-core decomposition has also recently been used as a basis
for the visualization of large networks, in particular for AS maps
[7l [U [28]. Finally, recent works using the k-core analysis have focused on the
analysis of the Internet maps obtained by the DIMES project [32]. In Refs.
[11, 12], an approach based on the k-core decomposition has been used to provide
a conceptual and structural model of the Internet, the so-called Medusa model.
Up to now, however, no study has considered the k-core decomposition of the
various commonly used models for complex networks, nor compared it to that of
real-world networks. Subramanian et al. [45] have proposed to classify ASes in
five different levels or "tiers", and have given a method to extract this
classification from the AS directed graph. This method can however lead to some
biases when the knowledge of all the peer-to-peer relationships is not complete.
The k-core decomposition studied in this paper, on the other hand, considers
undirected networks and yields a finer hierarchy, not based on the commercial
relations between vertices, in which the number of levels is not fixed a priori
but depends on the characteristics of the network. It is moreover not restricted
to AS maps but can be applied as well, for example, to Internet router maps or
more generally to any real or computer-generated graph.



3. k-core decomposition. Let us consider a graph $G = (V, E)$ of $|V| = n$
vertices and $|E| = e$ edges. The definition of k-cores from [5] is the
following.

Definition 1: A subgraph $H = (C, E|C)$ induced by the set $C \subseteq V$ is a
k-core or a core of order k if and only if the degree of every node $v \in C$
induced in H is greater than or equal to k (in symbolic form,
$\forall v \in C : \mathrm{degree}_H(v) \geq k$), and H is the maximum subgraph
with this property.

A k-core of G can therefore be obtained by recursively removing all the vertices
of degree less than k, until all vertices in the remaining graph have degree at
least k. It is worth remarking that this process is not equivalent to pruning
vertices of a certain degree. Indeed, a star-like subgraph formed by a vertex
with a high degree that connects many vertices of degree one, and connected to
the rest of the graph by a single edge only, is going to belong to the first
shell no matter how high the degree of the central vertex is. We will also use
the following definitions.

Definition 2: A vertex i has shell index k if it belongs to the k-core but not
to the (k + 1)-core.

Definition 3: A k-shell $S_k$ is composed of all the vertices whose shell index
is k. The maximum value of k such that $S_k$ is not empty is denoted
$k_{\max}$. The k-core is thus the union of all shells $S_c$ with $c \geq k$.

Definition 4: Each connected set of vertices having the same shell index c is a
cluster $Q^c$, where the corresponding set of edges are those connecting
vertices of the cluster. Each shell $S_c$ is thus composed of clusters $Q^c_m$,
such that $S_c = \bigcup_{1 \leq m \leq q^c_{\max}} Q^c_m$, where $q^c_{\max}$
is the number of clusters in $S_c$.

Figure 1. Sketch of the k-core decomposition for a small graph. Each closed line
contains the set of vertices belonging to a given k-core, while different types
of vertices correspond to different k-shells.

The k-core decomposition therefore identifies progressively internal cores and
decomposes the networks layer by layer, revealing the structure of the different
k-shells from the outermost one to the most internal one, as sketched in Fig. 1.

It is worth noting that the k-core decomposition can be easily implemented: the
algorithm by Batagelj and Zaversnik [6] has a time complexity of order O(n + e)
for a general graph. This makes the algorithm very efficient for sparse graphs,
where e is of order n.
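
The recursive pruning of Definitions 1 to 3 can be sketched in a few lines of
Python. This is a minimal illustration, not the implementation used for the
results of this paper: it assumes the graph is given as a dictionary mapping
each vertex to the set of its neighbours, and the names shell_indices, k_core
and from_edges are hypothetical helpers introduced here. Vertices are removed in
order of increasing current degree with one bucket per degree value, in the
spirit of the bucket-based peeling of [6].

```python
def shell_indices(adj):
    """Return {vertex: shell index} for an undirected simple graph.

    `adj` maps every vertex to the set of its neighbours (input format
    assumed for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    # buckets[d] holds the not-yet-removed vertices of current degree d.
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    shell = {}
    d = 0
    while d <= max_deg:
        if not buckets[d]:
            d += 1
            continue
        v = buckets[d].pop()
        shell[v] = d  # v is removed while the pruning threshold is d
        for w in adj[v]:
            # Only neighbours still above the threshold lose one unit, so
            # no remaining vertex ever drops below the current level d.
            if w not in shell and degree[w] > d:
                buckets[degree[w]].remove(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
    return shell


def k_core(adj, k):
    """Vertices of the k-core: union of all shells with index >= k."""
    return {v for v, c in shell_indices(adj).items() if c >= k}


def from_edges(edges):
    """Build the adjacency dictionary from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# A 4-clique (vertices 0-3) plus a hub (vertex 4) attached to it by a single
# edge and carrying ten leaves: the hub has degree 11 but shell index 1.
clique = [(a, b) for a in range(4) for b in range(a + 1, 4)]
star = [(4, 0)] + [(4, leaf) for leaf in range(5, 15)]
shell = shell_indices(from_edges(clique + star))
print(shell[4], shell[0])  # prints: 1 3
```

The small example reproduces the star-like configuration discussed after
Definition 1: the hub ends up in the first shell despite its large degree, while
the four clique vertices form the 3-core.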

A very interesting feature of the k-cores concerns their connectivity
properties. It has been for example shown experimentally in [12] that the
k-cores of the AS map obtained by the DIMES project [32] are k-connected, which
means that k disjoint paths are available between any two vertices belonging to
the k-core. In fact, for any two vertices u and v of the network, with shell
indices respectively $c_u$ and $c_v$, there are (with some exceptions for small
values of $c_u$ and $c_v$) at least $\min(c_u, c_v)$ disjoint paths between u
and v [12]. Such a property has important practical consequences since it
implies increasing robustness and routing capacities for more and more central
cores. The knowledge of such capacities identifies a very important hierarchy of
ASes that newly created ASes could take advantage of when choosing the ASes with
which to establish connections. We will come back to this point in section
4.1.3.

4. k-core structure of Internet maps and models.

4.1. Internet AS maps. In this section, we inspect Internet maps at the AS level
and compare their k-core structure with the insights obtained from models. In
order to obtain Internet connectivity information at the AS level it is possible
to inspect routing tables and paths stored in each router (passive measurements)
or to query the network directly with a software probe (active measurements). In
the following we consider data from two recent large-scale Internet mapping
projects using an active measurement approach. The skitter project at CAIDA [14]
has deployed several strategically placed probing monitors using a path probing
software. All the data are then centrally collected and merged in order to
obtain Internet maps that maximize the estimate of cross-connectivity. The
second set we consider is provided by the Distributed Internet Measurements and
Simulations (DIMES) project [32, 33]. At the time when the map was obtained, the
project consisted of more than 5,000 measuring agents performing Internet
measurements such as traceroute and ping. Table 1 displays a summary of the
basic properties of the considered Internet maps. We have also investigated
Internet maps obtained from the Oregon Routeviews project [38] (not shown), with
very similar results. In the following we show how the application of the k-core
decomposition can shed light on important hierarchical properties of Internet
graphs, focusing on the AS maps obtained by each project in 2005.

The first observation about the structure of the k-cores is that they remain
connected. This is not a priori an obvious fact since one can easily imagine
networks whose k-core decomposition yields several connected components
corresponding, e.g., to various communities. Instead, each decomposition step
merely peels the network, leaving its inner part connected. This reveals a
strongly hierarchical structure, i.e. the most connected part of the network is
also the most central. Figure 2 displays the size, in terms of vertices, of each
k-shell as a function of its index. As for RSF or BRITE networks (see section
4.2), power-law-like shapes are obtained. Important fluctuations appear at large
k, which is not very surprising since such shells of large index are relatively
small, except for the most central core, which contains 50 vertices at
$k_{\max} = 26$ and 82 vertices at $k_{\max} = 39$ for CAIDA and DIMES,
respectively. Such a structure has also been observed in the independent study
of [12].



source            n       e       $\langle d \rangle$   $d_{\max}$   $k_{\max}$
CAIDA, 2005/04    8542    25492   5.97                  1171         26
DIMES, 2005/05    20455   61760   6.04                  2800         39

Table 1. Main properties of the Internet maps considered in the present study:
number of vertices n and of edges e, average degree $\langle d \rangle$, maximum
degree $d_{\max}$ and maximum shell index $k_{\max}$.



Interestingly, a much larger $k_{\max}$ is obtained for the DIMES AS map than
for the CAIDA one. It is likely that this discrepancy is linked to the
difference between the exploration methods. The maximum core depends indeed
largely on the amount of discovered edges and lateral connectivity. The number
of "observers" is 22 for CAIDA but more than 5,000 for DIMES. It is therefore
reasonable that the latter is more likely to discover edges, and therefore
yields a larger value of $k_{\max}$.

4.1.1. Self-similarity. The properties of the successive k-cores of Internet
maps can be studied by considering their degree distributions and correlation
properties.

Figure 3 shows the cumulative degree distribution for the first k-cores, for the
various AS maps. Strikingly, the shape of the distribution, i.e. an approximate
power law, is not affected by the decomposition. This is illustrated by the fact
that the data for the various distributions collapse on top of each other once
the degree is rescaled by the average degree of the k-core. Note that in Fig. 3,
as in the following figures, we do not show data for all the cores, but only for
a representative set of k values. Figure 3 clearly shows how the exponent of the
power law is robust across the various k-cores, although the range of variation
of the degree decreases. In other words, each core conserves a broad degree
distribution: ASes with significantly different numbers of neighbors are present
in each core or hierarchy level.

Figure 2. Shell size as a function of their index for the AS maps. The dashed
line is a power law in k.




Figure 3. Rescaled cumulative degree distributions of some k-cores of the AS
Internet maps. The degree is normalized by the corresponding average degree
$\langle d \rangle$ in each k-core. The shapes of the distributions are
preserved by the successive pruning, pointing to a self-similar behavior of the
successive k-cores.



In order to better characterize and check this self-similarity, we have also
computed the two- and three-point correlation functions of the various k-cores.
A useful measure to quantify correlations between the degrees of neighboring
vertices is the average degree of nearest neighbors $d_{nn}(d)$ of vertices of
degree d [39]:

$$ d_{nn}(d) = \frac{1}{n_d} \sum_{j \,:\, d_j = d} \frac{1}{d_j} \sum_{i \in V(j)} d_i , \tag{1} $$

where V(j) is the set of the $d_j$ neighbors of vertex j and $n_d$ the number of
vertices of degree d. This last quantity is related to the correlations between
the degrees of connected vertices since on average it can be expressed as

$$ d_{nn}(d) = \sum_{d'} d' \, P(d'|d) , \tag{2} $$

where P(d'|d) is the conditional probability that a vertex with degree d is
connected to a vertex with degree d'. If degrees of neighboring vertices are
uncorrelated, P(d'|d) depends only on d' and thus $d_{nn}(d)$ is a constant.
When correlations are present, two main classes of possible correlations have
been identified: assortative behavior if $d_{nn}(d)$ increases with d, which
indicates that large-degree vertices are preferentially connected with other
large-degree vertices, and disassortative behavior if $d_{nn}(d)$ decreases with
d [37]. From a routing point of view, a disassortative behavior corresponds to a
network structure where vertices with small degree are preferentially connected
to the hubs (i.e., large-degree vertices). A second relevant and often studied
quantity is the clustering coefficient [48], which measures the local group
cohesiveness and is defined for any vertex j as the fraction of connected
neighbors of j:

$$ cc_j = \frac{2 \, n_{link}}{d_j (d_j - 1)} , \tag{3} $$

where $n_{link}$ is the number of links between the $d_j$ neighbors of j. The
study of the clustering spectrum cc(d) of vertices of degree d, defined as

$$ cc(d) = \frac{1}{n_d} \sum_{j \,:\, d_j = d} cc_j , \tag{4} $$

allows one, e.g., to uncover hierarchies in which low-degree vertices belong
generally to well interconnected communities (high clustering coefficient),
while hubs connect many vertices that are not directly connected (small
clustering coefficient). Large clustering has a clear relevance for routing
purposes since it indicates the presence of alternative paths thanks to the
presence of many triangles: if a link from a vertex u to a neighbor v goes down,
the message can be sent from u to v through a common neighbor.

Figure 4 shows that not only the degree distribution but also the clustering and
correlation structures of the Internet maps are essentially preserved as the
more and more external parts of the network are pruned. We note however that, as
also shown in [12], the largest k-cores are no longer scale-free: since they are
very densely connected, their degree distribution is rather peaked around an
average value and their topology is closer to that of a random graph with large
average degree.

In summary, the AS networks exhibit a statistical scale invariance with respect
to the pruning obtained with the k-core decomposition for a wide range of k.
Indeed, while this decomposition identifies subgraphs that progressively
correspond to the most central regions of the network, the statistical
properties of these subgraphs are preserved at many levels of pruning. This
hints at a sort of global self-similarity for regions of increasing centrality
of the network, and at a structure in which each region of the Internet, as
defined in terms of network centrality, has the same properties as the whole
network. This is particularly interesting since the properties of the Internet
(heterogeneous degree distributions, correlations, clustering...) have been up
to now studied at the level of the whole map, while one may be interested in
restricting the analysis to some particular regions of the map, focusing for
example on parts of the network with certain routing capabilities (QoS, failure
support). At a general level, the k-core decomposition appears therefore as a
suitable way to define a pruning procedure equivalent to a scale change that
preserves the statistical properties of graphs while focusing on their more and
more connected parts.
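
As an illustration of the degree-correlation and clustering measures of
Eqs. (1), (3) and (4), the following sketch computes $d_{nn}(d)$ and cc(d) for a
graph stored as a dictionary of neighbour sets. The function name
degree_correlations and the input format are assumptions made for this example,
and vertices of degree one are simply left out of the clustering spectrum, since
Eq. (3) is undefined for them.

```python
from collections import defaultdict
from itertools import combinations


def degree_correlations(adj):
    """Return (dnn, cc): dicts mapping a degree d to d_nn(d) and cc(d)."""
    dnn_sum = defaultdict(float)
    cc_sum = defaultdict(float)
    count = defaultdict(int)  # n_d, the number of vertices of degree d
    for j, nbrs in adj.items():
        d = len(nbrs)
        if d == 0:
            continue  # isolated vertices have no neighbours to average over
        count[d] += 1
        # Inner sum of Eq. (1): mean degree of the neighbours of j.
        dnn_sum[d] += sum(len(adj[i]) for i in nbrs) / d
        if d > 1:
            # Eq. (3): fraction of pairs of neighbours that are linked.
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            cc_sum[d] += 2.0 * links / (d * (d - 1))
    dnn = {d: dnn_sum[d] / count[d] for d in count}
    cc = {d: cc_sum[d] / count[d] for d in count if d > 1}
    return dnn, cc


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
dnn, cc = degree_correlations(adj)
print(dnn[1], dnn[2], cc[2])  # prints: 3.0 2.5 1.0
```

In this toy graph $d_{nn}(d)$ decreases with d (3, 2.5 and 5/3 for d = 1, 2 and
3), which is the disassortative signature described above. Applying the same
function to the subgraph induced by each k-core, and dividing by the
corresponding averages, gives rescaled curves of the kind compared in Fig. 4.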

Figure 4. Average nearest-neighbor degree (top) and rescaled clustering spectrum
(bottom) as a function of the degree for some k-cores of the AS Internet maps.
All the quantities are rescaled by the corresponding averages in each k-core.
The collapse of the various curves confirms the self-similar structure of the
k-cores.



4.1.2. Shell index and centrality. The identification of the most central
vertices is a major issue in network characterization [22]. While a first
intuitive and immediate measure of the centrality of vertices is given by their
degree, more refined investigations are needed in order to characterize the real
importance of various vertices: for example, some low-degree vertices may be
essential because they provide connections between otherwise separated parts of
the network. In order to uncover such important vertices, the concept of
betweenness centrality (BC) is now commonly used [22, 36]. The betweenness
centrality of a vertex v is defined as

$$ g(v) = \sum_{s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} , \tag{5} $$

where $\sigma_{st}$ is the number of shortest paths going from s to t and
$\sigma_{st}(v)$ is the number of shortest paths from s to t going through v.
This definition means that central vertices are part of more shortest paths
within the network than peripheral vertices. Moreover, the betweenness
centrality gives in transport networks an estimate of the traffic handled by the
vertices, assuming that the number of shortest paths is a zero-th order
approximation to the frequency of use of a given vertex (e.g. the load of an
AS), in the case of an all-to-all communication.
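
For unweighted graphs, Eq. (5) can be evaluated with one breadth-first search
per source vertex followed by a backward accumulation of path fractions
(Brandes' algorithm). The sketch below is only illustrative: the function name
betweenness and the dictionary-of-sets input are assumptions, the sum runs over
ordered pairs (s, t), and paths on which v is an endpoint are not counted, which
is one common reading of the definition.

```python
from collections import deque


def betweenness(adj):
    """Betweenness g(v) of Eq. (5) for an unweighted, undirected graph."""
    g = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: sigma[w] counts the shortest paths
        # from s to w, preds[w] lists the predecessors of w on those paths.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate the path fractions, starting from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                g[w] += delta[w]
    return g


# Path 0 - 1 - 2: only the middle vertex lies on a shortest path between
# two other vertices, once for each of the ordered pairs (0, 2) and (2, 0).
print(betweenness({0: {1}, 1: {0, 2}, 2: {1}}))
# prints: {0: 0.0, 1: 2.0, 2: 0.0}
```

Averaging the returned values over the vertices of each shell gives the kind of
curve shown in the bottom panels of Fig. 5.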

The k-core decomposition intuitively provides a hierarchy of the vertices based
on their shell index, which is a combination of local and global properties
(e.g., [50] shows that the shell index is a better criterion for centrality than
the degree in protein interaction networks). In this perspective, it becomes
very interesting to study the correlation between the degree, the betweenness
centrality and the shell index of a vertex in order to quantify the statistical
level of consistency of the various measures. We show in Fig. 5 the average
betweenness centrality (computed on the original graph) of vertices as a
function of their shell index, and the shell index as a function of the degree
d. A strong correlation is expected, but the fluctuations observed (given by the
error bars) should not be a surprise: while a low-degree vertex clearly has a
low shell index, large or medium degree vertices do not necessarily have a large
shell index. In the AS maps, we observe in fact that all large-degree vertices
belong to the most central core, while large fluctuations are observed for
intermediate degree values. Moreover, the betweenness centrality is a highly
non-local quantity which can be large even for small-degree vertices. These
quantities are thus pinpointing different kinds of centrality. The shell index
appears therefore as a very interesting quantity to uncover central vertices,
and it has the advantage of a much faster computation time than that required
for the betweenness centrality (of order $n^2 \log n$ [9]).

Figure 5. Average betweenness centrality as a function of shell index (bottom),
and average shell index as a function of the degree (top), for the AS Internet
maps. A clear correlation between these quantities is observed, although strong
fluctuations are present.

4.1.3. Potential practical implications. The k-core decomposition has
interesting immediate applications. First of all, as already mentioned in
section 3, it has been shown in Ref. [12] that each k-core of the DIMES AS map
is k-connected, and that the number of disjoint paths between two vertices u and
v of this map is bounded from below by the minimum of the shell indices of u and
v.

Moreover, it is quite easy to show and understand that similar properties hold
for a network under certain assumptions. In particular, if the central core (of
shell index $k_{\max}$) of a given network is $k_{\max}$-edge-connected, and if
there exist enough edges between the various shells (in particular if any
cluster, see Def. 4, of each k-shell is connected to the (k + 1)-shell by at
least k edges), then each k-core of the network turns out to be
k-edge-connected. We have in fact checked that these conditions are verified for
the CAIDA and DIMES maps as well as for the network models under study. Note
that k-edge connectivity (i.e. the existence of k distinct paths which do not
share any common edge) is less restrictive than k-connectivity. In the context
of Autonomous Systems and of the evaluation of routing capacities or of failure
possibilities, however, it is particularly relevant since a vertex of the AS map
represents in fact many different routers, so that different paths may cross at
a given AS while being effectively physically disjoint.
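
The number of edge-disjoint paths between two vertices can be checked
numerically as a maximum flow in which every edge carries at most one unit. The
sketch below uses breadth-first augmenting paths; it is an illustration under
assumed conventions (dictionary-of-sets input, hypothetical function name
edge_disjoint_paths) and is not necessarily the procedure used for the checks
reported above.

```python
from collections import deque


def edge_disjoint_paths(adj, s, t):
    """Number of edge-disjoint paths between s and t in an undirected graph."""
    if s == t:
        raise ValueError("s and t must be distinct vertices")
    # Residual capacities: one unit in each direction of every edge.
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Two triangles sharing vertex 2: the two paths from 0 to 3 use different
# edges but both cross vertex 2.
bowtie = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(edge_disjoint_paths(bowtie, 0, 3))  # prints: 2
```

The example shows the distinction made in the text: vertices 0 and 3 are joined
by two edge-disjoint paths, yet only one vertex-disjoint path exists because
every route passes through vertex 2.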

Such connectivity properties highlight the fact that the k-core decomposition
provides a natural definition for a hierarchy in the network, in which the more
central vertices (with larger shell index) have better routing capabilities
(i.e. they can choose among several paths to achieve a certain connection), and
each k-core constitutes an ensemble of ASes able to provide a certain QoS, with
globally larger robustness for larger k.

It is therefore interesting to compare the k-core decomposition with the tiers
hierarchy proposed by Subramanian et al. [45]. These two hierarchies have
different origins and motivations: on the one hand, the tiers classification is
based on the inference of AS commercial relationships; on the other hand, and
from a somewhat opposite point of view, the k-core decomposition gives a
classification of the network's vertices which does not have an a priori fixed
number of classes or levels, but which adapts itself to the situation of the
network. Moreover, the shell index of a vertex is not fixed once and for all but
may fluctuate in time due to possible connectivity changes (as investigated in
the next section). In this respect, such a hierarchy provides very relevant
information about the state of the network at a given time. While the current
routing protocols do not take advantage of such information, one could imagine
that future routing protocols may be able to exploit it.

We finally note that the use of the k-core decomposition in order to find a
certain hierarchy of connectedness properties is not limited to the analysis of
AS maps: it can as well be applied to other kinds of Internet maps, for example
at the router level, or to any communication or transportation network.

4.2. k-core structure of network models. In order to better understand the
properties of the k-core decomposition of networks and use it as a model
validation tool, we also apply this technique to a set of well-known and
commonly used models of networks, whose main characteristics are summarized in
Table 2. Various topological properties can lead to various decompositions, so
we consider both homogeneous and heterogeneous networks. For each model, we
present results corresponding to one random instance of the model, and have
checked that the highlighted properties do not depend on the particular instance
considered.



model                  n                e        $\langle d \rangle$   $d_{\max}$   $k_{\max}$
ER                     10"              10"      20                    41           14
BA                     $5 \cdot 10^4$   99998    4                     642          3
Weibull                $10^5$           307500   6.15                  377          9
RSF, $\gamma = 2.3$    97315            293891   6.04                  938          22
BRITE                  $10^5$           156145   3.63                  433          54
INET 3.0               10000            19676    3.936                 984          8

Table 2. Main properties of the models considered in the present study: number
of vertices n and of edges e, average degree $\langle d \rangle$, maximum degree
$d_{\max}$ and maximum shell index $k_{\max}$.

4.2.1. Size of shells. We first consider for reference the random graph model of
Erdős and Rényi (ER) [20], which is the most standard example of graphs with a
characteristic value for the degree (the average value $\langle d \rangle$). In
this case, the maximum index is clearly related to the average degree
$\langle d \rangle$. The vertex degrees have only small fluctuations, thus most
vertices belong to the same k-core, which is also the highest. Noticeably, the
size of the shells increases with the index, showing that only few vertices can
be considered as peripheral (see Fig. 6), and that the network contains no clear
hierarchy between nodes.

A second model we considered is the Barabási-Albert (BA) model [4], which has
been put forward to exemplify the concept of preferential attachment and as a
paradigm of dynamically evolving networks. In this model, a growing network is
constructed according to the preferential attachment mechanism: each new vertex
is connected to m already existing vertices chosen with a probability
proportional to their current degree. This model produces graphs with power-law
degree distributions, thus characterized by a very large variety of degree
values. On the other hand, this is a toy model that should not be considered as
a realistic model of the Internet, and indeed the corresponding k-core
decomposition is somewhat trivial, with only few shells at very small index. The
construction mechanism provides a simple explanation. Each new vertex enters the
system with degree m, but at the following time steps new vertices may connect
to it, increasing its degree. Inverting the procedure, we obtain exactly the
k-core decomposition. The minimum degree is m, therefore all shells $S_c$ with
c < m are empty. Recursively pruning all vertices of degree m, one first removes
the last vertex, then the one added at the preceding step, whose degree is now
reduced to its initial value m, and so on, up to the initial vertices, which may
have larger degree. Hence, all vertices except the initial ones belong to the
shell of index m.
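
This argument can be checked numerically with a small sketch. The code below
grows a BA graph and computes its shell indices; it is only an illustration
under stated assumptions. The seed graph is taken to be a complete graph on
m + 1 vertices (a choice made here, not specified in the text), so that the
initial vertices also end up in the shell of index m, and the names
barabasi_albert and shell_indices are hypothetical helpers.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a BA graph of n > m vertices, m >= 1 links per new vertex.

    Assumption of this sketch: the seed is a complete graph on m + 1 vertices.
    """
    rng = random.Random(seed)
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Every vertex appears in `stubs` once per incident edge, so a uniform
    # draw from this list selects a vertex proportionally to its degree.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct targets, no multiple links
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend((t, new))
    return adj


def shell_indices(adj):
    """Shell index of every vertex, by peeling in order of current degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    shell, d = {}, 0
    while d <= max_deg:
        if not buckets[d]:
            d += 1
            continue
        v = buckets[d].pop()
        shell[v] = d
        for w in adj[v]:
            if w not in shell and degree[w] > d:
                buckets[degree[w]].remove(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
    return shell


graph = barabasi_albert(2000, 3, seed=1)
shell = shell_indices(graph)
print(set(shell.values()), max(len(nbrs) for nbrs in graph.values()))
```

Whatever the random seed, the set of shell indices printed first contains the
single value m, while the maximum degree printed second is typically much
larger: a broad degree distribution alone does not produce a deep k-core
hierarchy.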

Other algorithms are widely used to obtain random graphs with prescribed broad
degree distributions. In the literature, different definitions of
heavy-tailed-like distributions exist. While we do not want to enter into the
detailed definitions, we have considered two classes of such distributions: (i)
scale-free or Pareto distributions of the form $P(k) \sim k^{-\gamma}$ (RSF),
and (ii) Weibull distributions (WEI)
$P(k) = (a/c)(k/c)^{a-1} \exp(-(k/c)^a)$. The scale-free distribution has a
diverging second moment and therefore virtually unbounded fluctuations, limited
only by a possible finite-size cut-off. The Weibull distribution is akin to
power-law distributions truncated by an exponential cut-off, which are often
encountered in the analysis of scale-free systems in the real world. Indeed, a
truncation of the power-law behavior is generally due to finite-size effects and
other physical constraints. Both forms have been proposed as representing the
topological properties of the Internet [10]. We have generated the corresponding
random graphs by using the algorithm proposed by Molloy and Reed [Ml [3S]: the
vertices of the graph are assigned a fixed sequence of degrees $\{k_i\}$,
$i = 1, \ldots, N$, chosen at random from the desired degree distribution P(k),
with the additional constraint that the sum $\sum_i k_i$ must be even; then, the
vertices are connected by $\sum_i k_i / 2$ edges, respecting the assigned
degrees and avoiding self- and multiple connections. The parameters used are
a = 0.4 and c = 0.6 for the Weibull distribution, and $\gamma = 2.3$ for the RSF
case.
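
The construction just described can be sketched as a random matching of
"stubs", each vertex i contributing $k_i$ of them. The text does not specify how
self- and multiple connections are avoided; the sketch below assumes the
simplest strategy, namely discarding a matching as soon as it produces one and
starting again. The function name configuration_model and the max_tries
parameter are hypothetical, and this restart strategy may need many attempts for
very broad degree sequences.

```python
import random


def configuration_model(degrees, seed=None, max_tries=1000):
    """Random simple graph in which vertex i has degree degrees[i]."""
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # One stub per unit of degree; pairing stubs at random creates the edges.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        adj = {v: set() for v in range(len(degrees))}
        simple = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:
                simple = False  # self-loop or repeated edge: start again
                break
            adj[a].add(b)
            adj[b].add(a)
        if simple:
            return adj
    raise RuntimeError("no simple graph found within max_tries attempts")


# Twenty vertices of degree 3: the result has exactly 30 edges.
graph = configuration_model([3] * 20, seed=0, max_tries=10000)
print(sum(len(nbrs) for nbrs in graph.values()) // 2)  # prints: 30
```

To reproduce the RSF or Weibull cases, the list passed as degrees would be drawn
from the corresponding P(k), redrawing one value whenever the sum turns out to
be odd.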

The previous construction can be considered as static, as it does not involve a
dynamical attachment rule. The topology generator INET 3.0 [49] also falls into
this class. This generator has been specifically designed to represent the
Internet at the AS level by obtaining a closely similar topology. As shown in
Fig. 6, such a network presents a small number of k-cores, with a shell size
that decreases exponentially with increasing shell index.

Figure 6. Shell size as a function of their index for the various models
considered. The various models yield very different shapes, indicating that the
k-core decomposition is an interesting additional tool for network
characterization.

Another Internet topology generator often discussed in the literature is the
BRITE generator, which proposes a growth mechanism combining the addition of
vertices with m new links, attached according to preferential attachment, with the
addition of new links between already existing vertices, also through a preferential
attachment mechanism. In this case, a non-trivial structure of shells is obtained,
with a largest shell index $k_{\max}$ much larger than the average degree, and a shell
size decreasing as a power-law function of the index. This implies a similar power-law
relation between the size of each k-core and its index, as observed in real Internet
maps. At large k, large fluctuations are observed, with a relatively large central
core (see Fig.). The difference between BRITE and BA networks highlights the
structural relevance of the addition of new links between already existing vertices
in a growing heterogeneous network model.

In general, as shown for an example in Fig. 5 (and with the exception of the
BA model), random networks with heavy-tailed degree distributions systematically
present a large number of shells, much larger than the average degree
$\langle d \rangle$ (we have also checked that $k_{\max}$ increases if $\gamma$
decreases). The shell size decreases as a power law of the index [HI |25], with a
quite large central core of index $k_{\max}$, as for BRITE. On the contrary, Weibull
distributed networks have relatively few shells, with a much smaller $k_{\max}$. It is
interesting that networks with relatively similar degree distributions can in fact
present strongly different k-core decompositions. This points to the k-core
decomposition as a valuable supplementary tool for network investigation.

4.2.2. Core statistics and structure. In this paragraph, we compare the
characteristics of the different cores, i.e. of more and more central parts of the
network. In the following we will focus only on the models that have a core structure
resembling that of the Internet, as the ER and the BA models are readily ruled out as
possible candidates to represent the Internet.

Figure 7. Cumulative degree distribution of some k-cores for
some model networks. For each k-core, the degree is normalized
by the average degree of the core. For these various models, the
collapse of the various distributions shows a striking property of
statistical self-similarity of the successive k-cores.

Figure 7 shows the cumulative degree distribution for some k-cores, for some of
the studied models; namely, the probability $P_>(d)$ that any vertex in the network
has a degree larger than d. Strikingly, the shape of the distribution (power law
or Weibull) is not affected by the decomposition. This feature, already noted in
[18] for uncorrelated scale-free networks, points to a striking property of statistical
self-similarity of the generated k-cores, which resemble one another under
an appropriate rescaling by the average degree.
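
As an illustration of the rescaling used in this comparison, the sketch below extracts a k-core by recursive pruning and computes its cumulative degree distribution with the degrees divided by the average degree of the core. It is a minimal example under assumptions made here (graphs stored as a dict of neighbor sets, hypothetical function names), not the analysis code of the paper.

```python
from bisect import bisect_right


def k_core(adj, k):
    """Return the k-core as a new adjacency dict (empty if there is none)."""
    core = {v: set(nbrs) for v, nbrs in adj.items()}
    stack = [v for v in core if len(core[v]) < k]
    while stack:
        v = stack.pop()
        if v not in core:
            continue
        # Remove v and update the degrees of its surviving neighbors.
        for u in core.pop(v):
            core[u].discard(v)
            if len(core[u]) < k:
                stack.append(u)
    return core


def rescaled_cumulative_degree(core):
    """Return pairs (d / <d>, P_>(d)) for the degrees present in the core."""
    degrees = sorted(len(nbrs) for nbrs in core.values())
    n = len(degrees)
    if n == 0:
        return []
    mean = sum(degrees) / n
    points = []
    for d in sorted(set(degrees)):
        larger = n - bisect_right(degrees, d)  # vertices with degree > d
        points.append((d / mean, larger / n))
    return points
```

Plotting the output of `rescaled_cumulative_degree(k_core(adj, k))` for several values of k on the same axes is one way to look for the collapse discussed in the text.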

As in the case of Internet maps, we further characterize this self-similarity by
computing the two- and three-point correlations, as defined by the average degree of
nearest neighbors $d_{nn}(d)$ of vertices of degree d and the clustering spectrum
$cc(d)$ of vertices of degree d. These quantities are reported in Figs. 8 and 9 for
the various k-cores. Strikingly, the behavior of the two quantities is preserved in
all cases as the network is recursively pruned of its low-degree vertices. In other
words, the overall network topology is invariant for k-cores of increasing centrality.

4.2.3. Summary. In summary, the k-core decomposition uncovers very different
behaviors for different models which may otherwise share, e.g., very similar
degree distributions. The k-core decomposition is therefore a useful tool in the
context of the model validation process. For example, a growing network obtained
with the linear preferential attachment rule may have a scale-free distribution of
degrees $P(k) \sim k^{-\gamma}$ but will have a trivial shell structure because of
its construction mechanism. On the other hand, randomly constructed scale-free
networks, which may have weak correlation properties and small clustering, can
present a rich hierarchical decomposition with a large central core of high shell
index. This appears to be in agreement with the results of Ref. [30], where
structural correlations and constraints appear to be sufficient to determine most of
the statistical properties observed in large scale graphs.

Figure 8. Nearest neighbors degree distribution of some k-cores,
rescaled by the corresponding average values, for some model
networks. The degree of each node is normalized by the average
degree of each k-core. The data collapse confirms the statistical
self-similarity of the cores.

Figure 9. Clustering coefficient spectrum of some k-cores for some
model networks. The degree of each node is normalized by the
average degree of each k-core, and the clustering coefficient is rescaled
by the average clustering of each k-core. Once again, a collapse is
observed, confirming the self-similarity of the k-cores.

Map        n       e        <d>      d_max    k_max
2001/05    7400    24791    6.700    1820     28
2002/03    8489    28871    6.802    2007     32
2003/05    8755    27300    6.236    1560     26
2004/04    9238    28016    6.065    1406     26
2005/04    8542    25492    5.969    1171     26

Table 3. Characteristics of the CAIDA AS maps considered for
the time analysis: number of vertices n and of edges e, average
degree <d>, maximum degree d_max and maximum shell index k_max.



5. k-cores, dynamics and sampling biases.

5.1. Temporal variations of the k-core structure. The availability of data
obtained by the various projects makes it possible to study the temporal evolution
of the Internet maps. We have considered the maps obtained by the CAIDA project
at various times between 2001 and 2005. Table 3 shows the main characteristics of
the analyzed maps, each of which was obtained from the archives of one complete
month.

While statistical signatures such as degree distribution, disassortative behavior
and clustering spectrum are typically very stable over time, the k-core structure
analysis reveals some finer variations. For example, the number of vertices and
edges and the maximal shell index fluctuate in the CAIDA maps. This can be
traced back to the fact that the number of sources used by CAIDA changes (14
for the 2001/05 map, 21 for 2002/03, 24 for 2003/05 and 2004/04, and 22 for
2005/04), and that the locations of some of these sources also change.

Interesting information also arises from the study of the change in the composition
of the various k-shells: we show as an example in Fig. 10 the probability for a
given AS to change from a shell of index x in a map obtained at a given time to a
shell of index y in the successive map. While most vertices do not change their shell
index, as shown by the dark area around the diagonal, some undergo an important
change of status, from a highly central shell to a peripheral one or vice versa. This
highlights the presence of strong structural fluctuations in the evolution of CAIDA
AS maps.
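
The probability of moving from a shell of index x to a shell of index y can be estimated from two snapshots as in the following sketch. It assumes, for illustration only, that each snapshot is available as a dict mapping an AS identifier to its shell index; the function name and the use of `None` to label ASes missing from the later snapshot are hypothetical choices.

```python
from collections import Counter, defaultdict


def shell_transition_probabilities(shell_before, shell_after, absent=None):
    """Estimate P(y | x): shell index y in the later map given index x earlier.

    Vertices missing from `shell_after` are counted under the label `absent`.
    """
    counts = defaultdict(Counter)
    for vertex, x in shell_before.items():
        counts[x][shell_after.get(vertex, absent)] += 1
    return {
        x: {y: c / sum(row.values()) for y, c in row.items()}
        for x, row in counts.items()
    }
```

Swapping the two arguments gives the reverse statistics, i.e. the earlier shell index (or absence) of the vertices found in the later map.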

A further fingerprint of such structural changes is provided by the analysis of the
shell index of vertices that appear in or disappear from the maps between one
snapshot and the next, as shown in Fig. 11: vertices in all shells, even central ones,
disappear from the CAIDA maps, even between the most recent maps of 2004 and 2005.
The fluctuations observed in the shell index of ASes may be related to three factors.
The first is the natural evolution of the Internet structure. The second is the
renumbering of the ASes for administrative reasons (see http://www.iana.org).
The third is the uncertainty and bias in the data collection. In this respect,
CAIDA maps seem to exhibit a high level of instability, indicative of a mapping
process that is not stable in time. In this context, the k-core analysis appears as an
interesting tool to highlight the temporal changes of the Internet structure as well
as the measurement reliability in each particular experimental set-up, at an
intermediate level between global quantities and local ones such as the degree. It
will certainly be of interest in the future to study similar data for evolving DIMES
maps, which are obtained with a much larger set of sources.

Figure 10. The grayscale code gives the probability of a change
in shell index, from the CAIDA map of 2004/04 (x axis) to the one
of 2005/04 (y axis). The line of points corresponds to ASes that
are present in 2004 but not in 2005, and the column of points
to the reverse situation. Most nodes do not change shell index,
as the dark area around the diagonal shows, but some important
changes occur, with central nodes becoming peripheral, or vice
versa.

5.2. Sampling biases. In this paragraph, we perform a sensitivity analysis of the
k-core decomposition with respect to potential sampling biases. In particular we
want to assess the effect of incompleteness and sampling biases on the resulting
structure of sampled graphs. For this reason we simulate incomplete sampling
processes on network models and compare the k-core structure of the
sampled graph with that of the original one.

Internet maps are currently obtained through sampling methods of the real
Internet, which are based on a merging of paths between sources and destinations,
obtained either through Border Gateway Protocol routing tables or through active
traceroute measurements. Such sampling processes present possible sources
of errors and biases whose effect has up to now been studied essentially for the
degree distributions [27l IHl HSl [161 [131 [Ml. The analysis of idealized sampling
processes on networks with various topologies has in particular revealed that the
broadness of the degree distributions observed in Internet maps is a genuine feature,
although important biases can remain on the exact form of the distribution, due to
an undersampling of vertices with small degree. Moreover, although a path-based
sampling process can produce a heterogeneous graph out of a homogeneous initial
network (such as an ER graph), as rigorously shown in [13], this is restricted to
the case of single-source probing. It is therefore interesting to note that a
single-source traceroute-like probing of any network yields essentially a tree, whose
k-core decomposition is by definition trivial (with $k_{\max} = 1$). Another obvious
but important remark concerns the largest shell index: by definition, a sampling
cannot discover paths or edges that do not exist, so that the maximal shell index
of a network, $k_{\max}$, cannot be increased by partial sampling (nor can the maximal
observed degree). The actual $k_{\max}$ is thus at least equal to the one found by
a sampling of the true network.

Figure 11. Probabilities that the vertices entering (IN nodes) or
disappearing from (OUT nodes) the CAIDA maps have shell index
c. Note how even nodes with large shell index disappear from one
year to the next.

Since more central cores are more connected, and more paths go through them,
path-based sampling should intuitively discover and sample the more central
cores better, while the peripheral shells could suffer from stronger biases. In order
to check such ideas, we perform a traceroute-like probing of the various model
networks considered in Section 4.2 and compare their k-core decomposition before and
after sampling. We use the same model for traceroute as in [15l [TBI [26]: a set of
$N_s$ sources sends probes to $N_t$ destinations randomly placed on the network, and
the shortest paths between the source-destination pairs are merged to compose the
sampled network. We use $N_s = 50$ sources, and various probing efforts measured
by $\epsilon = N_s N_t / N$ (where N is the size of the initial network), from a small
value $\epsilon = 0.1$ (corresponding to a small density of targets
$N_t/N = 2 \cdot 10^{-3}$) to a much larger $\epsilon = 5$ (relatively large density
of targets $N_t/N = 10^{-1}$).
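
The traceroute-like model described above can be sketched as follows. This is a simplified illustration and not the authors' simulation code: the function names are hypothetical, the graph is assumed to be a dict of neighbor sets, and a single shortest path per source-destination pair is taken from a breadth-first search tree (the text does not specify how ties between equally short paths are resolved).

```python
import random
from collections import deque


def bfs_parents(adj, source):
    """Return the parent of each reachable vertex in a BFS tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return parent


def traceroute_sample(adj, n_sources, n_targets, seed=0):
    """Merge one shortest path per (source, target) pair into a set of edges."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    sources = rng.sample(nodes, n_sources)
    targets = rng.sample(nodes, n_targets)
    edges = set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            v = t
            # Walk back from the target to the source along the BFS tree;
            # unreachable targets are skipped.
            while v in parent and parent[v] is not None:
                edges.add(frozenset((v, parent[v])))
                v = parent[v]
    return edges
```

With a single source, the merged paths form a subtree of the BFS tree, which illustrates why single-source probing gives a trivial decomposition. For a hypothetical network of N = 10^4 vertices and N_s = 50, the effort $\epsilon = 0.1$ would correspond to N_t = 20 targets.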

Figure 12 presents the curves of the k-shell size as a function of the index for
various network models and various sampling efforts. For ER networks, the
populated shells change from being at index values only slightly under
$k = \langle d \rangle$ to much smaller values, with an almost uniform population of
shells. The observed behavior is therefore completely different from the one observed
in AS maps. On the contrary, the power-law shape obtained for RSF or BRITE networks,
which is comparable to the one of the AS maps, is very robust, even if the slope is
affected. Indeed, shells of smaller indices are less well sampled. In particular, the
size of the first shell is most strongly decreased by the sampling procedure; in some
cases, the first shell is larger than the second in the original network, but becomes
smaller in the sampled network. We note that in the available AS maps, the first
shell is indeed typically smaller than the second, and that the true AS network thus
very probably exhibits a much larger shell of index $k = 1$. Similarly, one can expect
that the exponent close to 2.7 of the power-law behavior of the shell size vs. its
index (see [12] and Fig. 2) is a lower bound and that this value might be reconsidered
in the future thanks to more and more extensive sampling efforts. On the other
hand, the fact that the shell of largest index is substantially larger than the ones
with immediately lower indices is well preserved, even if its index is substantially
decreased by the fact that many edges are ignored during the sampling process.

Figure 12. Plot of the size of the k-shells vs. k for various models,
before and after traceroute-like sampling, with different probing
efforts $\epsilon$. The qualitative shapes are preserved by sampling.

Figures 13 and 14 moreover show that the self-similar properties of the k-core
decomposition are preserved by the sampling process. Although the precise form of
the degree distribution of the whole network is slightly altered, the basic correlation
properties are conserved by the sampling. Moreover, the self-similar structure of
the k-core decomposition is also preserved, as a comparison of Figs. 13 and 14 with
Figs. 8 and 9 clearly shows.

While the main statistical properties of the k-core decomposition are therefore
largely conserved by the sampling process, which makes it possible to distinguish
between networks with different topological structures, important quantitative biases
can appear and compromise the accuracy of the measurements, as we now investigate. In
order to understand such effects in more detail, we show in Figs. 15 and 16 the
probability for a vertex of given shell index in the original network to have another
shell index in the sampled network, in the case of an original network obtained
by the BRITE generator. At low sampling effort, many vertices are simply left
undiscovered, and the shell index properties can be strongly affected in a seemingly
erratic way, as shown by the large scattering of data in Fig. 15. As soon as the
sampling effort is increased to a more reasonable level, however, a strong
correlation appears between the true shell index and its value in the sampled graph,
even if a systematic downwards trend is observed (Fig. 16).

Figure 13. Nearest neighbors degree distribution of some k-cores,
rescaled by the corresponding average values, for some network
models after sampling through a traceroute-like process with
$N_s = 50$ sources and target density $N_t/N = 0.1$. The data
collapse shows that the self-similarity is preserved by sampling.

Figure 14. Clustering spectrum of some k-cores, rescaled by the
corresponding average values, for some network models after
sampling through a traceroute-like process with $N_s = 50$ sources and
target density $N_t/N = 0.1$.

In summary, our results indicate that the sampling biases affect only slightly
the measurement of the statistical properties of heterogeneous graphs and of their
k-core decomposition, even at a relatively low level of sampling. In fact, the routing
properties as "measured" by the shell indices will be rather underestimated
due to the incomplete sampling of edges, which can be taken as rather good news,
showing that the AS network probably offers better performance (QoS, robustness)
than what can be measured from the present maps.

Figure 15. The grayscale code gives the probability of a change
in shell index due to the traceroute-like sampling, from a certain
index before sampling (x axis) to another one after sampling (y
axis). The line at y = 0 represents the probability of vertices of shell
index x to be absent from the sampled graph. The initial network
is obtained by the BRITE generator. Here $N_s = 50$ sources and
a fraction $N_t/N = 2 \cdot 10^{-3}$ of targets are used. The low sampling
effort implies that many nodes are not discovered, and that the
measured shell index can differ strongly from the original one.

6. Conclusions. We have presented the application of the k-core decomposition
to the analysis of large scale network models and of large scale Internet maps. The
k-core decomposition allows the progressive pruning of the networks and the
identification of subgraphs of increasing centrality. These subgraphs have the property
of being more and more densely connected, and therefore of presenting more and
more robust routing capabilities. The study of the obtained subgraphs uncovers the
main hierarchical layers of the network and allows for their statistical
characterization. Strikingly, we observe for the Internet at the Autonomous System
level a statistical self-similarity of the topological properties for cores of
increasing centrality.

The k-core decomposition proves useful not only to uncover the hierarchical
decomposition of real maps, but also for model validation. For example, many
models, although having, e.g., degree distribution and clustering properties similar
to those of real maps, do not present shell index values as large as the real data,
nor a similar structure in which each k-core is composed of a constant fraction
of the (k - 1)-core. The k-core decomposition should therefore be considered as a
valuable supplementary tool for network characterization and model validation.

It is also worth mentioning that the router level k-core structure of the Internet
appears to have different properties than those appearing at the AS level [3H [55].
This calls for repeating the present analysis for the different router level maps
available at the moment in order to better emphasize the structural difference
exhibited by the two different mapping granularities.

Figure 16. Same as Fig. 15 for $N_s = 50$ sources and
$N_t/N = 2 \cdot 10^{-2}$ (top) and $N_t/N = 10^{-1}$ (bottom). As the sampling effort
is increased, the measured and the original shell index become more
correlated.

Moreover, the k-core analysis makes it possible to compare maps obtained by
different mapping processes, follow their temporal evolution and assess the stability
of these maps. It also appears as an interesting way of discriminating between
various topologies, even after sampling biases have been introduced: for example, a
sampled ER network may display a power-law-like degree distribution in the case of a
very limited sampling effort, but its k-core decomposition will in any case remain
very different from the one of sampled heterogeneous networks.

Finally, the k-core decomposition may also be used to define a computationally
feasible centrality measure and a hierarchy between the nodes of a network. It
combines the degree ranking with more global structural properties, connectedness
and routing capabilities, providing a centrality measure that is highly correlated
with the various standard definitions such as degree and betweenness centrality.

In conclusion, the k-core decomposition appears at a general level as a very
interesting and useful additional tool for the analysis of complex networks, with
particular relevance in the context of technological and communication networks.